Abstract

We present fast algorithms for constructing probabilistic embeddings 
and approximate distance oracles in sparse graphs. The main ingredient 
is a fast algorithm for sampling the probabilistic partitions of Calinescu, 
Karloff, and Rabani in sparse graphs. 



1 Introduction 

Metric decompositions aim to partition the points of a metric space into blocks such that nearby points tend to be placed in the same block, while distant pairs of points are placed in different blocks. For most metric spaces, no straightforward interpretation of these goals exists.

One successful compromise is the notion of a probabilistic partition. A Δ-bounded probabilistic partition is a probability distribution over partitions of the metric space such that, in every partition in the distribution, the diameters of the blocks are at most Δ, while "close-by" pairs of points are in the same block with "high" (or at least "non-negligible") probability.

Probabilistic partitions first appeared, to the best of our knowledge¹, in a paper of Linial and Saks [TH], and were publicized in the work of Bartal [3] on probabilistic embeddings. Calinescu, Karloff and Rabani [10] introduced the following probabilistic partition of metric spaces, which we describe as an algorithm that samples a partition from the probability distribution.

We call the probabilistic partition P sampled by Algorithm 1 a Δ-bounded CKR partition. Naive implementations of Algorithm 1 take Ω(n²) time for n-point metric spaces. It seems hard to break the Ω(n²) barrier on the running time in general finite metric spaces. However, in many situations the metric spaces we deal with come from the shortest-path metric on relatively sparse graphs. In those cases we can do better, as the following theorem shows.



*M. Mendel was partially supported by an ISF grant no. 221/07, a BSF grant no. 2006009, 
and a gift from Cisco research center. This work is part of the M.Sc. thesis of C. Schwob 
prepared in the Computer Science Division of the Open University of Israel. 

¹ Closely related notions of partitions appeared before, e.g., in [17].
Algorithm 1 CKR-Partition

Input: A finite metric space (X, ρ), scale Δ > 0
Output: Partition P of X
π := random permutation of X
R := random number in [Δ/4, Δ/2]
for i = 1 to |X| do
    C_i := {y ∈ X : ρ(y, π(i)) ≤ R} \ ⋃_{j=1}^{i−1} C_j
return P := {C_1, ..., C_{|X|}} \ {∅}
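For concreteness, here is a minimal Python sketch of Algorithm 1 for a metric given as an n × n distance matrix. The function names `ckr_partition` and `blocks`, and the use of Python's `random` module, are illustrative assumptions rather than part of the original presentation; the doubly nested loop makes the Θ(n²) cost of the naive implementation explicit.

```python
import random

def ckr_partition(dist, delta, rng=random):
    """Sample a delta-bounded CKR partition of the points 0..n-1.

    dist: n x n symmetric distance matrix (assumed to be a metric).
    Returns block, where block[y] = i (1-based) means y belongs to C_i,
    the block grown around the i-th center pi(i).
    """
    n = len(dist)
    pi = list(range(n))
    rng.shuffle(pi)                        # pi := random permutation of X
    R = rng.uniform(delta / 4, delta / 2)  # R := random number in [delta/4, delta/2]
    block = [0] * n                        # 0 means "not yet assigned"
    for i, center in enumerate(pi, start=1):
        for y in range(n):
            # C_i := B(pi(i), R) minus the points already in C_1, ..., C_{i-1}
            if block[y] == 0 and dist[y][center] <= R:
                block[y] = i
    return block

def blocks(labels):
    """Group point indices by block label; empty C_i never appear."""
    groups = {}
    for y, lab in enumerate(labels):
        groups.setdefault(lab, []).append(y)
    return list(groups.values())
```

Every point is eventually assigned, since y lies in its own ball when π(i) = y, and each block has diameter at most 2R ≤ Δ.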



Theorem 1. Suppose we are given a number Δ > 0 and an undirected graph with positive edge weights G = (X, E, ω). Suppose G has n vertices and m edges, and let ρ denote the shortest-path metric in G. One can sample a Δ-bounded CKR partition of (X, ρ) in expected O(m log n + n log² n) time.

The sampling will be accomplished by Algorithm 2 (Section 3).

CKR partitions have found many algorithmic (as well as mathematical) applications, and we mention only a few of them here. They were introduced as part of an approximation algorithm for the 0-extension problem [10], [12]. Fakcharoenphol, Rao and Talwar [13] used them to obtain an asymptotically tight probabilistic embedding into trees, which we call the FRT-embedding. Probabilistic embeddings are used in many of the best known approximation and online algorithms as a reduction step from general metrics into tree metrics. Mendel and Naor [21] showed that the FRT-embedding possesses a stronger embedding property, which they called "maximum gradient embedding". Recently, Räcke [23] used FRT-embeddings to obtain hierarchical decompositions for congestion minimization in networks, and used them to give an O(log n) approximation algorithm for the minimum bisection problem and an O(log n) competitive online algorithm for the oblivious routing problem. Krauthgamer et al. [16] used CKR partitions to give a new proof of Bourgain's embedding theorem. Mendel and Naor [22] used them to obtain an asymptotically tight metric Ramsey theorem and approximate distance oracles.

The improved running time of the sampling of CKR partitions may improve 
the running time of many of their applications. In order to keep the paper short 
we work out the details of only two (related) applications of CKR partitions: 
FRT-embeddings, and approximate distance oracles based on CKR-partitions. 

Probabilistic embedding into ultrametrics [3].

An ultrametric on X is a metric ν which satisfies ν(x, z) ≤ max{ν(x, y), ν(y, z)} for every x, y, z ∈ X. A probabilistic embedding of a metric space (X, ρ) into ultrametrics with distortion D is a probability distribution Π over ultrametrics ν on X such that

1. For every x, y ∈ X, Pr_Π[ν(x, y) ≥ ρ(x, y)] = 1.

2. For every x, y ∈ X, E_Π[ν(x, y)] ≤ D · ρ(x, y).
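Both conditions can be checked mechanically on small examples. The following Python sketch, with hypothetical helper names, tests the ultrametric inequality for a single distance matrix ν, checks condition 1 for one ultrametric in the support, and computes the expected stretch E_Π[ν(x, y)]/ρ(x, y) of a pair for a finitely supported Π given as (probability, matrix) pairs.

```python
from itertools import product

def is_ultrametric(nu):
    """True iff nu(x, z) <= max(nu(x, y), nu(y, z)) for all x, y, z (nu: n x n matrix)."""
    n = len(nu)
    return all(nu[x][z] <= max(nu[x][y], nu[y][z])
               for x, y, z in product(range(n), repeat=3))

def dominates(nu, rho):
    """Condition 1 for one ultrametric in the support: nu(x, y) >= rho(x, y) for all pairs."""
    n = len(rho)
    return all(nu[x][y] >= rho[x][y] for x in range(n) for y in range(n))

def expected_stretch(support, rho, x, y):
    """E_Pi[nu(x, y)] / rho(x, y), where support is a list of (probability, nu) pairs."""
    return sum(p * nu[x][y] for p, nu in support) / rho[x][y]
```

The distortion D of Π is then the maximum of `expected_stretch` over all pairs x ≠ y, provided every ultrametric in the support passes `dominates`.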
FRT-embedding is a probabilistic embedding into ultrametrics with distortion O(log n) for every n-point metric space [13]. This bound is asymptotically tight for certain classes of finite metric spaces, such as graphs of high girth [3], grids [5], and expanders [TE].

Approximate distance oracles. 

An approximate distance oracle is a data structure with "compact" (o(n²)) storage that answers (approximate) distance queries in a given n-point metric space in constant time. A simple counting argument over all bipartite graphs shows that exact, and even 2.99-approximate, answers are impossible when the storage is o(n²). The history of this problem is nicely summarized in [25]. In particular, Thorup and Zwick [25] gave an asymptotically tight trade-off between the approximation and the storage²: for every k ∈ ℕ, they constructed a (2k − 1)-approximate distance oracle requiring O(kn^{1+1/k}) storage and answering queries in O(k) time. Recently Mendel and Naor [22] presented different approximate distance oracles, based on CKR partitions. While those oracles do not give an optimal approximation/storage trade-off³, they answer distance queries in an absolute constant time, regardless of the approximation parameter.

Theorem 2. Let G = (X, E, ω) be an n-vertex weighted graph with m edges, and let ρ be the shortest-path metric on X. Then

1. It is possible to sample from the FRT-embedding of (X, ρ) in O(m log² n) expected time.

2. It is possible to construct in O(mn^{1/k} log² n) expected time an O(k)-approximate distance oracle for (X, ρ) based on CKR partitions, whose storage is O(n^{1+1/k}).

For approximate distance oracles, it is also possible to improve the naive construction time even when the metric is given as a distance matrix, by first constructing a spanner of the metric with o(n²) edges, and then using the fast CKR partitions for sparse graphs on that spanner.

Theorem 3. For n-point metric spaces given as a distance matrix, it is possible to construct an O(k)-approximate distance oracle based on CKR partitions, whose storage is O(n^{1+1/k}), in O(n²) expected time.

We remark that a different probabilistic partition, developed by Bartal [HIS] and Abraham et al. [1], has properties similar to (and even stronger than) those of CKR partitions. However, we do not see an easy way to quickly obtain a sample from this distribution when the graph is sparse.

² The lower bound on the approximation assumes a conjecture of Erdős about the number of edges possible in a graph with a given number of vertices and a given girth; see [25].

³ As reported in [22], oracles of size O(n^{1+1/k}) support an approximation factor of 128k in the queries. While the constant 128 can be reduced by optimizing the parameters in the construction, it is unlikely to get below 8.
Further Results

The second named author presents in [53] an efficient PRAM algorithm for sampling CKR partitions and constructing approximate distance oracles in weighted graphs. The running time of the algorithm is polylog(n) and the total work is O(m polylog(n)).

Outline of the paper. 

After setting up the notation and reviewing the properties of CKR partitions in Section 2, we prove Theorem 1 in Section 3.

In many applications (and in particular, probabilistic embeddings and approximate distance oracles), probabilistic partitions are applied hierarchically, using an exponentially decreasing series of scales. This naively implies an added O(log Φ) factor in the running time, where Φ is the spread⁴ of the metric. There is a standard technique that converts this log Φ factor into a log n factor. However, we are not aware of a concrete implementation that satisfies the efficiency requirements needed in this paper. We therefore sketch the details of this technical step in Section 4. The specific applications examined in this paper, Theorem 2 and Theorem 3, are discussed in Section 5.

2 Preliminaries 

For simplicity of the presentation, the model of computation we assume is a unit-cost real-word RAM. In this model words can hold real numbers, and arithmetic, comparison, and truncation operations take unit time. Our algorithms, however, do not take advantage of the unrealistic power of this model, and can also be presented in more realistic computational models such as the unit-cost floating-point word RAM model (cf. [15, Sec. 2.2]).

The diameter of a finite subset Y ⊆ (X, ρ) is defined as diam(Y) = max{ρ(x, y) : x, y ∈ Y}. For simplicity of the presentation, we assume that the given finite metric (X, ρ) has minimum non-zero distance 1 and diameter diam(X) = Φ.

The (closed) ball around x of radius r is defined as B_ρ(x, r) = {y ∈ X : ρ(x, y) ≤ r}. When ρ is clear from the context we may omit it from the notation. Given a partition P of X and x ∈ X, we denote by P(x) the block of P that contains x.

Δ-bounded CKR partitions have an obvious upper bound of Δ on the diameter of the blocks in the partition. The following is the padding property they enjoy.

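Lemma 2.1 ([13]). Let P be a Δ-bounded CKR partition of the metric (X, ρ). Then, for every x ∈ X and t ≤ Δ/8,

$$\Pr\big[B(x,t) \subseteq P(x)\big] \ge \left(\frac{|B(x,\Delta/8)|}{|B(x,\Delta)|}\right)^{16t/\Delta}. \tag{1}$$

⁴ The spread is the ratio between the diameter and the smallest non-zero distance in the metric.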
In this paper we define a hierarchical partition of a metric space (X, ρ) as a sequence of ⌈log_8 Φ⌉ + 2 partitions P_{−1}, P_0, ..., P_{⌈log_8 Φ⌉} such that P_i is a partition of X at scale 8^i, and P_i is a refinement of P_j when i < j, i.e., for every x ∈ X, P_i(x) ⊆ P_j(x). Given a sequence (Q_j)_{j≥−1} of partitions, where Q_j is an 8^j-bounded partition of X, the common refinement of (Q_j)_j is the hierarchical partition (P_j)_{j≥−1} in which P_j = {⋂_{i≥j} C_i : C_i ∈ Q_i}.
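A small sketch of the common refinement, assuming each partition Q_j is given as a list of block labels indexed by point. Representing P_j(x) by the tuple of labels (Q_i(x))_{i≥j} is an illustrative encoding chosen here, not the tree representation used later in Section 4: two points share a block of P_j exactly when their tuples agree.

```python
def common_refinement(Q):
    """Q: dict mapping a scale index j (j >= -1) to a list of block labels,
    Q[j][x] = label of the block of Q_j that contains point x.
    Returns P with P[j][x] = a hashable label of P_j(x), the intersection of
    the blocks Q_i(x) over all i >= j."""
    scales = sorted(Q)                  # -1, 0, 1, ..., top
    n = len(Q[scales[0]])
    P = {}
    key = [() for _ in range(n)]        # labels of Q_i(x) for the scales seen so far
    for j in reversed(scales):          # coarsest scale first
        key = [key[x] + (Q[j][x],) for x in range(n)]
        P[j] = list(key)
    return P
```

Because the tuple for scale j extends the tuple for scale j + 1, each P_j automatically refines P_{j+1}.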

By sampling stochastically independent CKR partitions at the different scales and then taking their common refinement, we obtain the following result.

Lemma 2.2 ([22]). Fix a finite metric space (X, ρ). Then there exists an (efficiently sampleable) probability distribution ℋ over hierarchical partitions such that for every x ∈ X and every 0 < β < 1/8,

$$\Pr\big[\forall k \ge -1,\ B(x, \beta 8^k) \subseteq P_k(x)\big] \ge |X|^{-16\beta}.$$
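The exponent 16β can be traced as follows. This is a sketch under two stated assumptions: that the padding bound (1) of Lemma 2.1 has the form Pr[B(x, t) ⊆ P(x)] ≥ (|B(x, Δ/8)|/|B(x, Δ)|)^{16t/Δ} for t ≤ Δ/8, and that the CKR partitions Q_j at scales 8^j, j = −1, ..., J with J := ⌈log_8 Φ⌉, are sampled independently.

```latex
\begin{align*}
\Pr\big[\forall k \ge -1,\ B(x,\beta 8^k) \subseteq P_k(x)\big]
  &\ge \Pr\big[\forall j \ge -1,\ B(x,\beta 8^j) \subseteq Q_j(x)\big] \\
  &\ge \prod_{j=-1}^{J} \left(\frac{|B(x,8^{j-1})|}{|B(x,8^{j})|}\right)^{16\beta}
   = \left(\frac{|B(x,8^{-2})|}{|B(x,8^{J})|}\right)^{16\beta}
   = |X|^{-16\beta}.
\end{align*}
```

The first inequality holds because P_k(x) = ⋂_{j≥k} Q_j(x) and β8^k ≤ β8^j for j ≥ k. The product uses independence and the assumed bound with t = β8^j ≤ 8^j/8. The product telescopes, and B(x, 8^{−2}) = {x} (the minimum distance is 1) while B(x, 8^J) = X (the diameter is Φ ≤ 8^J).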

A finite ultrametric (X, ν) can be represented by a tree as follows.

Definition 4. An ultrametric tree (T, Γ) is a metric space whose elements are the leaves of a rooted finite tree T. Associated with every vertex u ∈ T is a label Γ(u) ≥ 0 such that Γ(u) = 0 iff u is a leaf of T. If a vertex u is a child of a vertex v then Γ(u) ≤ Γ(v). The distance between two leaves x, y ∈ T is defined as Γ(lca(x, y)), where lca(x, y) is the least common ancestor of x and y in T.

Every finite ultrametric can be represented by an ultrametric tree, and vice versa: the metric on an ultrametric tree is a finite ultrametric. A hierarchical partition (P_k)_{k≥−1} of (X, ρ) naturally corresponds to an ultrametric ν on X where, for x ≠ y, ν(x, y) = 8^{min{k : P_k(x) = P_k(y)}}.
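The correspondence can be sketched directly on the label encoding of a hierarchical partition (one list of block labels per scale, a hypothetical input format). The sketch assumes the partitions are nested and that x and y share a block at some scale, for instance because the coarsest partition is {X}; otherwise it raises an error.

```python
def ultrametric_from_hierarchy(P, x, y):
    """P: dict j -> list of block labels of P_j (j = -1, 0, 1, ...), nested as in a
    hierarchical partition. Returns nu(x, y) = 8 ** min{k : P_k(x) = P_k(y)} for
    x != y, and 0 for x == y."""
    if x == y:
        return 0.0
    common = [j for j in P if P[j][x] == P[j][y]]
    if not common:
        raise ValueError("x and y never share a block")
    return 8.0 ** min(common)
```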

Let G = (X, E, ω) be an undirected positively weighted graph. Let ρ : X × X → [0, ∞) be the shortest-path metric on G. We denote by n = |X| the number of vertices, and by m = |E| the number of edges. We assume an adjacency-list representation of graphs.

The undirected single-source shortest paths (USSSP) problem in weighted graphs is used as a subroutine in our algorithm. Given a weighted graph with n vertices and m edges, Dijkstra's classical USSSP algorithm [11] with source w maintains for each vertex v an upper bound δ(v) on the distance between w and v. If δ(v) has not been assigned yet, it is interpreted as infinite. Initially, we just set δ(w) = 0, and we have no visited vertices. At each iteration, we select an unvisited vertex u with the smallest finite δ(u), visit it, and relax all its edges. That is, for each incident edge (u, v) ∈ E, we set δ(v) ← min{δ(v), δ(u) + ω(u, v)}. We continue until no unvisited vertex with finite δ is left. Using Fibonacci heaps [2] or Brodal's priority queues [5], Dijkstra's algorithm is implemented in O(m + n log n) time.
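The Dijkstra loop just described can be sketched in Python as follows. This is an illustration under a stated simplification: it uses a binary heap (`heapq`) with lazy deletion instead of Fibonacci heaps or Brodal's queues, so it runs in O((n + m) log n) time rather than O(m + n log n). The adjacency format and function name are hypothetical.

```python
import heapq

def dijkstra(adj, w):
    """Single-source shortest paths from w.
    adj: dict vertex -> list of (neighbor, weight), positive weights.
    Returns a dict with the final (finite) delta(v) of every reachable v."""
    delta = {w: 0.0}
    visited = set()
    heap = [(0.0, w)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:            # stale entry: lazy deletion replaces decrease-key
            continue
        visited.add(u)              # u has the smallest finite estimate: visit it
        for v, wt in adj[u]:        # relax all edges incident to u
            nd = d + wt
            if nd < delta.get(v, float("inf")):
                delta[v] = nd
                heapq.heappush(heap, (nd, v))
    return delta
```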
3 Fast CKR partitions



Given an undirected positively weighted graph G = (X, E, ω) with n vertices and m edges, whose shortest-path metric is denoted by ρ, and Δ > 0, we show how to implement Algorithm 1 in O(m log n + n log² n) expected time.

First, we sample a random permutation π, which can be generated in linear time using several methods, e.g., the Knuth shuffle (see [5]). Next, we sample R uniformly⁵ in the range [Δ/4, Δ/2].

We then use a variant of Dijkstra's algorithm for computing the blocks. The algorithm performs |X| iterations. In the i-th iteration, all vertices in B_ρ(π(i), R) not yet assigned to some block are put in C_i. In order to gain the improved running time of Theorem 1, we change Dijkstra's algorithm to return the distance of a point v from π(i) only if this distance is smaller than the distance of v from π(j) for all j < i.

Technically, this is done as follows. Consider the i-th iteration and let δ(v) be the variable that holds Dijkstra's algorithm's current estimate of the distance between π(i) and v. In Dijkstra's algorithm δ(v) is usually initialized to ∞ and then gradually decreases until v is extracted from the priority queue, at which point δ(v) = ρ(π(i), v). In the variant of Dijkstra's algorithm used in Algorithm 2, δ(·) is not reinitialized when the value of i changes. This means that at the end of the (i − 1)-th iteration, δ(v) = min_{j<i} ρ(π(j), v) whenever this minimum is at most R (see Lemma 3.1), which in turn means that an edge (u, w) is relaxed in the i-th iteration only when π(i) is the closest center to both u and w among π(j), j ≤ i. This dramatically reduces the number of relaxations, and does not hurt the correctness of the algorithm. The full details are given in Algorithm 2.

Lemma 3.1. After the i-th iteration of the main loop (over i) of Algorithm 2,

$$\delta(v) = \begin{cases} \min_{j \le i} \rho(\pi(j), v) & \text{if } \min_{j \le i} \rho(\pi(j), v) \le R,\\ \infty & \text{otherwise.} \end{cases} \tag{2}$$

Sketch of a proof. Proof by induction on i. When i = 1, and as long as δ(w) ≤ R in the while loop, the algorithm behaves exactly as Dijkstra's algorithm, and hence (2) is true for i = 1.

Assume inductively that (2) is correct for i − 1. If min_{j≤i−1} ρ(π(j), v) ≤ ρ(π(i), v), then clearly the i-th iteration will not change δ(v), and by the induction hypothesis we are done.

Assume now that min_{j≤i−1} ρ(π(j), v) > ρ(π(i), v) and ρ(π(i), v) ≤ R. Let π(i) = v_0, v_1, ..., v_ℓ = v be a shortest path between π(i) and v. We claim that for every t ∈ {1, ..., ℓ}, min_{j≤i−1} ρ(π(j), v_t) > ρ(π(i), v_t), since otherwise we would have

$$\min_{j \le i-1} \rho(\pi(j), v) \le \min_{j \le i-1} \rho(\pi(j), v_t) + \rho(v_t, v) \le \rho(\pi(i), v_t) + \rho(v_t, v) = \rho(\pi(i), v),$$

contradicting the assumption.

⁵ A closer look at the analysis of the CKR partitions (see [22]) reveals that it is sufficient to sample R from a discrete distribution having resolution of Δ/(c log n), and therefore this step can be carried out in a "realistic" computational model such as the unit-cost floating-point word RAM model.
Algorithm 2 Graph-CKR-Partition



Input: Graph G = (X, E, ω), scale Δ > 0
Output: Partition P of X
Generate a random permutation π of X
Sample a random R ∈ [Δ/4, Δ/2]
for all v ∈ X do
    δ(v) := ∞
    P(v) := 0
for i := 1 to |X| do  // Perform modified Dijkstra's algorithm starting from π(i)
    δ(π(i)) := 0
    Q := ∅  // Q is a priority queue with δ being the key
    w := π(i)
    while δ(w) ≤ R do  // w is visited now
        if P(w) = 0 then
            P(w) := i
        for all u : (u, w) ∈ E do
            if δ(u) > δ(w) + ω(u, w) then  // Relax edges adjacent to w
                δ(u) := δ(w) + ω(u, w)
                if u ∉ Q then
                    Insert u into Q  // otherwise the key δ(u) has just decreased
        if Q = ∅ then exit the while loop
        Extract w ∈ Q with minimal δ(w)
return P
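A minimal Python sketch of Algorithm 2, assuming an undirected adjacency-dict input with comparable vertex ids (each edge listed in both directions). The function name is hypothetical, and the priority queue Q is simulated by a binary heap with lazy deletion instead of decrease-key, so the sketch illustrates the logic rather than the exact Fibonacci-heap bound. As in Algorithm 2, the estimates δ are never reset between iterations, which is what prunes the relaxations.

```python
import heapq
import random

def graph_ckr_partition(adj, delta, rng=random):
    """adj: dict vertex -> list of (neighbor, weight), positive weights.
    Returns P with P[v] = i >= 1, the index of the center pi(i) whose block holds v."""
    pi = list(adj)
    rng.shuffle(pi)                        # random permutation pi of X
    R = rng.uniform(delta / 4, delta / 2)  # R in [delta/4, delta/2]
    est = {v: float("inf") for v in adj}   # delta(v): NOT reset between iterations
    P = {v: 0 for v in adj}                # 0 = not yet assigned to a block
    for i, c in enumerate(pi, start=1):
        est[c] = 0.0                       # delta(pi(i)) := 0
        heap = [(0.0, c)]                  # fresh queue Q keyed by delta
        while heap:
            d, w = heapq.heappop(heap)
            if d > est[w]:
                continue                   # stale entry: a smaller key was pushed later
            if d > R:
                break                      # every remaining key exceeds R
            if P[w] == 0:
                P[w] = i                   # w is visited: it joins C_i if still free
            for u, wt in adj[w]:
                if est[u] > d + wt:        # succeeds only if pi(i) beats earlier estimates
                    est[u] = d + wt
                    heapq.heappush(heap, (est[u], u))
    return P
```

With the same random source, the output agrees with the naive rule P(v) = min{i : ρ(π(i), v) ≤ R}, which is what the correctness part of the proof of Theorem 1 establishes.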



Hence all the edges along the path π(i) = v_0, ..., v_ℓ = v will be relaxed in the i-th iteration, and so at the end of the i-th iteration, δ(v) = ρ(π(i), v). □

Proof of Theorem 1. We first prove the correctness of Algorithm 2, i.e., that P(v) = min{i : ρ(π(i), v) ≤ R} for every v ∈ X. Let i_0 = min{i : ρ(π(i), v) ≤ R}. This means that min_{j<i_0} ρ(π(j), v) > R ≥ ρ(π(i_0), v). By Lemma 3.1, at the beginning of the i_0-th iteration δ(v) = ∞, and hence P(v) = 0, and by the end of the i_0-th iteration δ(v) = ρ(π(i_0), v), and necessarily P(v) = i_0. Note that once P(v) is set to a non-zero value, its value will not change.

We next bound the running time. We will show that every vertex is inserted into the queue O(log n) times in expectation, and every edge (u, v) of G undergoes O(log n) relaxations in expectation. Fix v and consider the non-increasing sequence a_i = min_{j≤i} ρ(π(j), v). In the i-th iteration, δ(v) decreases only if a_{i−1} > a_i. Note that a_{i−1} > a_i means that ρ(π(i), v) is the strict minimum among {ρ(π(j), v) | j ≤ i}, and the probability (over π) for this to happen is at most 1/i. By linearity of expectation, the expected number of rounds of the i-loop in which δ(v) decreases (and hence v is inserted into the queue) is at most

$$\sum_{i=1}^{n} \frac{1}{i} \le 1 + \ln n.$$

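Furthermore, the edges of a vertex w are scanned only in rounds in which w is visited, and w is visited in round i only if w = π(i) or δ(w) decreased in that round. Hence, by another application of linearity of expectation, the expected number of edge relaxations is at most

$$O\Big(\sum_{v \in X} \ln n \cdot \deg(v)\Big) = O(m \log n).$$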

Using Fibonacci heaps [M] or Brodal's priority queues [9], the total running time of Algorithm 2 is O(r + s log n), where r is the number of relaxations and s is the number of "insert" and "extract minimum" operations. In our case E[r] = O(m log n) and E[s] = O(n log n). Therefore the total expected running time of Algorithm 2 is O(m log n + n log² n). □
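The harmonic bound above can be checked exactly for small n. The following Python sketch (hypothetical helper names) averages, over all n! orders of n distinct distances, the number of indices at which the prefix minimum a_i strictly decreases, and compares the result with H_n = Σ_{i≤n} 1/i and with 1 + ln n. It assumes distinct distances, which is the case where the 1/i bound is tight.

```python
import math
from fractions import Fraction
from itertools import permutations

def expected_prefix_minima(n):
    """Exact average, over all orders of n distinct distances, of the number of
    indices i at which a_i = min_{j<=i} (distance of pi(j)) strictly decreases."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        best = math.inf
        for value in perm:
            if value < best:        # pi(i) is closer than all earlier centers
                best = value
                total += 1
        count += 1
    return Fraction(total, count)

def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))
```

For example, `expected_prefix_minima(5)` equals H_5 = 137/60, which is below 1 + ln 5.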



4 Hierarchical Partitions 

In this section we explain how to dispense with the O(log Φ) factor in the naive implementation of the hierarchical partitions, and replace it with O(log n). The method used is standard. Similar arguments appeared previously, e.g., in [2], [13], [mi], [m]. However, the context here is slightly different, and the designated time bound is O(m log² n), which is faster than the implementations we are aware of. While the argument is relatively straightforward, a full description of it is tedious to write and read. Instead we only sketch the implementation here. A complete description, including an algorithmic implementation, appears in [21].

In a naive implementation, the number of scales in which we sample CKR partitions is Θ(log Φ). This leads to an O((n log² n + m log n) log Φ) bound on the expected running time. Here we develop an implementation having O(m log² n) expected running time. We define for each scale an appropriate quotient of the input graph. We then show that CKR partitions of those substitute graph metrics retain the properties of CKR partitions of the original metric. Using those quotients, not all scales need to be processed, and the total size of the quotient graphs in all processed scales is O(m log n).

For Y, Y' ⊆ X, let ρ(Y, Y') = min{ρ(x, x') : x ∈ Y, x' ∈ Y'}. Given a partition 𝒴 of the space (X, ρ), we define the quotient metric on 𝒴 as

$$\nu(Y, Y') = \min\Big\{\sum_{j=1}^{\ell} \rho(Y_{j-1}, Y_j) : \ell \in \mathbb{N},\ Y_0, \ldots, Y_\ell \in \mathcal{Y},\ Y_0 = Y,\ Y_\ell = Y'\Big\}.$$

Definition 5. A space (𝒴, ν) is called a Δ-bounded quotient of an n-point metric space (X, ρ) if 𝒴 is a Δ-bounded partition of X, ν is the quotient metric on 𝒴, and for every x ∈ X, B_ρ(x, Δ/n) ⊆ 𝒴(x).

Note that a Δ-bounded quotient of an n-point metric space exists: define a relation x ~ x' if ρ(x, x') ≤ Δ/n, and take the transitive closure. The quotient subsets are the equivalence classes. Any two points of a class are joined by a chain of at most n − 1 steps of length at most Δ/n each, so by the triangle inequality the diameter of each equivalence class is at most Δ.
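The existence argument translates directly into code. The following Python sketch (hypothetical names, metric given as a distance matrix) builds the classes with a union-find structure and then computes the quotient metric ν by a Floyd–Warshall pass over the inter-cluster distances ρ(Y, Y'), i.e., the shortest chains Y_0, ..., Y_ℓ from the definition.

```python
def delta_bounded_quotient(dist, delta):
    """dist: n x n metric matrix. Clusters = transitive closure of rho(x, x') <= delta/n.
    Returns (cluster_of, nu): cluster_of[x] is a cluster id and nu[a][b] is the
    quotient metric between clusters a and b."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for x in range(n):
        for y in range(x + 1, n):
            if dist[x][y] <= delta / n:
                parent[find(x)] = find(y)   # x ~ y
    roots = sorted({find(x) for x in range(n)})
    cid = {r: k for k, r in enumerate(roots)}
    cluster_of = [cid[find(x)] for x in range(n)]
    k = len(roots)
    # rho(Y, Y'): minimum distance between points of two different clusters
    nu = [[0.0 if a == b else float("inf") for b in range(k)] for a in range(k)]
    for x in range(n):
        for y in range(n):
            a, b = cluster_of[x], cluster_of[y]
            if a != b and dist[x][y] < nu[a][b]:
                nu[a][b] = dist[x][y]
    # quotient metric: shortest chains of inter-cluster distances (Floyd-Warshall)
    for m in range(k):
        for a in range(k):
            for b in range(k):
                if nu[a][m] + nu[m][b] < nu[a][b]:
                    nu[a][b] = nu[a][m] + nu[m][b]
    return cluster_of, nu
```

On the line points 0, 0.5, 1, 5, 5.4, 12 with Δ = 3, the first and last clusters are at quotient distance 4 + 6.6 = 10.6, less than their ρ-distance 11, since the chain may jump through the middle cluster for free.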

The following lemma follows easily from Lemma 2.1; see the proof of [20, Lemma 5].
Lemma 4.1. Fix Δ > 0, and let (𝒴, ν) be a Δ-bounded quotient of (X, ρ). Let σ : X → 𝒴 be the natural projection, assigning each vertex x ∈ X to its cluster 𝒴(x). Let L be a (Δ/2)-bounded CKR partition of 𝒴.

Let P be the pullback of L under σ, i.e., P = {σ^{−1}(A) | A ∈ L}. Then P is a Δ-bounded partition of X such that for every 0 < t ≤ Δ/16 and every x ∈ X,

and furthermore, if t ≤ Δ/(2n), then

$$\Pr\big[B_\rho(x, t) \subseteq P(x)\big] = 1. \tag{4}$$

We define G|_Δ as the subgraph of G with edges of weight at most Δ and no isolated vertices.

Definition 6. Given a weighted graph G = (X, E, ω) and Δ > 0, define the graph G|_Δ = (X|_Δ, E|_Δ, ω|_Δ) as follows:

E|_Δ = {(u, v) ∈ E : u ≠ v and ω(u, v) ≤ Δ},
X|_Δ = {u ∈ X : ∃v ∈ X, (u, v) ∈ E|_Δ}, and
ω|_Δ is the restriction of ω to E|_Δ.

Lemma 4.2. Given a weighted graph G = (X, E, ω) and Δ > 0, let L be a Δ-bounded CKR partition of X|_Δ, using the metric induced by G|_Δ. Then P = L ∪ {{v} : v ∈ X \ X|_Δ} is a Δ-bounded CKR partition of X, using the metric induced by G.

Proof. Let ρ be the shortest-path metric on G. Observe that when computing a Δ-bounded CKR partition of (X, ρ), no edge of weight larger than Δ is "used" by Dijkstra's algorithm for computing the balls, and therefore discarding such edges does not change the behavior of the algorithm. Also, for each v ∈ X \ X|_Δ, B_ρ(v, Δ) = {v}, i.e., in any Δ-bounded CKR partition of X, v will appear in a singleton subset. □
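The two bookkeeping steps of Lemma 4.2 can be sketched as follows: compute G|_Δ from an edge list, and extend a partition L of X|_Δ by singletons. The names are hypothetical, and the CKR partition L of G|_Δ is taken as a given input rather than computed here.

```python
def restrict(edges, delta):
    """G|_delta: keep non-loop edges of weight <= delta and the vertices they touch.
    edges: list of (u, v, w). Returns (X|_delta, E|_delta)."""
    kept = [(u, v, w) for u, v, w in edges if u != v and w <= delta]
    verts = {u for u, _, _ in kept} | {v for _, v, _ in kept}
    return verts, kept

def extend_with_singletons(L, all_vertices, restricted_vertices):
    """P = L plus one singleton block {v} for every v in X minus X|_delta.
    L: list of blocks (sets) partitioning X|_delta."""
    return [set(b) for b in L] + [{v} for v in all_vertices
                                  if v not in restricted_vertices]
```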

Lemma 4.2 and Lemma 4.1 form the basis for dispensing with the dependence on the spread in the construction time. We next sketch the scheme we use. Denote the input graph G = (X, E, ω), |X| = n, |E| = m, and let ρ be the graph metric on G.

We first construct an ultrametric ν on X, represented by an ultrametric tree H = (T, Γ), such that for every u, v ∈ X, ρ(u, v) ≤ ν(u, v) ≤ n·ρ(u, v). H can be constructed in O(m + n log n) time using a minimum spanning tree procedure; see [ini, Section 3.2]. For a given Δ > 0 and a leaf v ∈ T, denote by σ_Δ(v) the highest ancestor u of v for which Γ(u) ≤ Δ/2. Using the level-ancestor data structure (cf. [7]), the tree T can be preprocessed in O(n) time such that queries for σ_Δ(v) (given Δ and v) are answered in O(log n) time. See [15, Section 3.5] for a similar supporting data structure.
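One way to obtain such a tree is a Kruskal-style sketch: process the edges by increasing weight and, whenever an edge of weight w merges two components, create a parent labeled (n − 1)·w. This is offered as an illustrative assumption, not as the exact procedure of the cited reference: it sorts the edges (O(m log m) time) and answers σ queries by walking up parent pointers instead of using a level-ancestor structure. Since the MST path between u and v has at most n − 1 edges, each of weight at most its bottleneck weight b, and every u–v path in G contains an edge of weight at least b, the labels satisfy ρ(u, v) ≤ ν(u, v) ≤ (n − 1)·ρ(u, v). The `threshold` parameter of `sigma` plays the role of the cutoff in the definition of σ_Δ.

```python
def mst_ultrametric_tree(n, edges):
    """Kruskal-style sketch. Leaves 0..n-1 have label 0; merging two components along
    an MST edge of weight w creates a parent labeled (n - 1) * w. Assumes a connected
    graph with positive weights. Returns (parent, label); the root is its own parent."""
    parent = list(range(n))          # tree parent pointers
    label = [0.0] * n                # Gamma(u); 0 exactly at the leaves
    comp = list(range(n))            # union-find over the vertices of G
    top = list(range(n))             # tree node representing each union-find root

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]  # path halving
            x = comp[x]
        return x

    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                 # not an MST edge
        node = len(label)            # new internal node above both components
        parent.append(node)
        label.append((n - 1) * w)
        parent[top[ru]] = node
        parent[top[rv]] = node
        comp[ru] = rv
        top[rv] = node
    return parent, label

def nu(x, y, parent, label):
    """Ultrametric distance: the label of the least common ancestor of leaves x, y."""
    ancestors = {x}
    u = x
    while parent[u] != u:
        u = parent[u]
        ancestors.add(u)
    u = y
    while u not in ancestors:
        u = parent[u]
    return label[u]

def sigma(v, threshold, parent, label):
    """Highest ancestor u of leaf v with label[u] <= threshold. Labels grow towards
    the root, so the walk stops at the first violation (naive O(depth) walk)."""
    u = v
    while parent[u] != u and label[parent[u]] <= threshold:
        u = parent[u]
    return u
```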
Given Δ > 0, define the weighted graph G(Δ) = (V(Δ), E(Δ), ω(Δ)) as follows:

V(Δ) = {σ_Δ(v) : v ∈ X},
E(Δ) = {(σ_Δ(u), σ_Δ(v)) : (u, v) ∈ E, σ_Δ(u) ≠ σ_Δ(v)},
ω(Δ)(u, v) = min{ω(w, z) : (w, z) ∈ E, σ_Δ(w) = u, σ_Δ(z) = v}.

Let ρ(Δ) be the shortest-path metric on G(Δ). Then, directly from the definitions, (V(Δ), ρ(Δ)) is a (Δ/2)-bounded quotient of (X, ρ).
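Given the cluster map v ↦ σ_Δ(v), the graph G(Δ) is an edge contraction that keeps, for each pair of adjacent clusters, the lightest connecting edge. A minimal Python sketch, with hypothetical names and clusters identified by integer ids:

```python
def contract(edges, cluster_of):
    """G(Delta): vertices are the clusters; two distinct clusters are adjacent if some
    original edge joins them, with weight = the minimum such original weight.
    cluster_of: dict vertex -> integer cluster id (e.g. the node returned by sigma)."""
    vertices = set(cluster_of.values())
    best = {}
    for u, v, w in edges:
        a, b = cluster_of[u], cluster_of[v]
        if a == b:
            continue                      # edge inside one cluster disappears
        key = (min(a, b), max(a, b))      # undirected edge between clusters
        if w < best.get(key, float("inf")):
            best[key] = w
    return vertices, best
```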

For an integer j ≥ −1 denote G_j = (V_j, E_j, ω_j), where G_j = (G(8^j))|_{8^j/2}. The following lemma gives an upper bound on the total size of the graphs G_j.

Lemma 4.3.

$$\sum_{j \ge -1} \big(|V_j| + |E_j|\big) = O(m \log n).$$

Proof. Fix (u, v) ∈ E and j ≥ −1 such that (σ_{8^j}(u), σ_{8^j}(v)) ∈ E(8^j). By the definition of E(8^j), σ_{8^j}(u) ≠ σ_{8^j}(v), so by the definition of σ_{8^j} the ultrametric ν represented by H satisfies ν(u, v) > 8^j/2, and hence ω(u, v) ≥ ρ(u, v) ≥ ν(u, v)/n > 8^j/(2n). Also, (σ_{8^j}(u), σ_{8^j}(v)) ∈ E_j if and only if ω(8^j)(σ_{8^j}(u), σ_{8^j}(v)) ≤ 8^j/2. Since every cluster σ_{8^j}(·) has ρ-diameter at most 8^j/2, the triangle inequality gives ρ(u, v) ≤ ω(8^j)(σ_{8^j}(u), σ_{8^j}(v)) + 8^j, and hence ρ(u, v) ≤ 1.5 · 8^j. That is, each edge of G is represented in G_j only when ρ(u, v) ∈ (8^j/(2n), 1.5 · 8^j], which happens for a total of O(log n) scales. By definition, G_j contains only non-isolated vertices, so for every j, |V_j| ≤ 2|E_j|. □

Let Processed = {j ≥ −1 : V_j ≠ ∅}.

Lemma 4.4. The set of graphs (G_j)_{j∈Processed} can be constructed in O(m log² n) expected time⁶.

Sketch of a proof. First we sort the edges E = {e_1, ..., e_m} in non-increasing order of weight. We keep a "sliding window" [i_L(t), i_R(t)], where i_L(t), i_R(t) ∈ {1, ..., m} and t ∈ {1, ..., |Processed|}, as follows. Let j_1 = ⌈log_8 Φ⌉, i_L(1) = 1, and i_R(1) = max{i : ω(e_i) > 8^{j_1}/(2n)}. Assuming j_{t−1} is already defined, define j_t = max{j < j_{t−1} : ∃i, 8^j ≥ ω(e_i) > 8^j/(2n)}, i_L(t) = min{i : ω(e_i) ≤ 8^{j_t}}, and i_R(t) = max{i : ω(e_i) > 8^{j_t}/(2n)}. Note that {j_t}_t = Processed, and the definition gives an O(m) time algorithm for computing the sequences (j_t)_t, (i_L(t))_t, and (i_R(t))_t. Constructing G_{j_t} can now be done in O((i_R(t) − i_L(t) + 1) log n) time, by observing that the set of vertices is

V_{j_t} = {σ_{8^{j_t}}(u_i), σ_{8^{j_t}}(v_i) : i ∈ {i_L(t), ..., i_R(t)}, (u_i, v_i) = e_i},

and similarly the set of edges is

E_{j_t} = {(σ_{8^{j_t}}(u_i), σ_{8^{j_t}}(v_i)) : i ∈ {i_L(t), ..., i_R(t)}, (u_i, v_i) = e_i}.

Another log n factor in the construction time comes from the O(log n) time needed for each query of the form "σ_Δ(u)". Since each edge e_i lies in the window only for scales with 8^{j_t} ≥ ω(e_i) > 8^{j_t}/(2n), it lies in O(log n) windows (compare Lemma 4.3), and therefore Σ_t (i_R(t) − i_L(t) + 1) = O(m log n). □



⁶ With a bit more care the running time can be improved to O(m log n). This improvement, however, will not improve the total construction time of the hierarchical partition.
Next, we sample an (8^{j_t}/2)-bounded CKR partition L_{j_t} of each G_{j_t}. By Lemma 4.1, (L_{j_t})_t (implicitly) represents CKR partitions of G at all scales. Using Theorem 1 and Lemma 4.3, computing (L_{j_t})_t is done in O(m log² n) expected time.

Hierarchical partitions have an O(n) storage representation, similar to efficient ultrametric tree representations such as the net-tree of [15]. We use a rooted tree 𝒫 whose leaves correspond to the points of X; each internal vertex u has at least two children and is labeled with (the logarithm of) a scale, s(u). The 8^j-bounded partition P_j is now defined as follows: for x ∈ X, P_j(x) is the set of leaves below the highest ancestor u of x in 𝒫 such that s(u) ≤ j. Since the tree 𝒫 does not have vertices of degree 2, except maybe the root, its size is O(n).
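The query P_j(x) on this tree representation can be sketched as follows, assuming (hypothetically) that the tree is stored as parent and children dictionaries and that the scale labels s(u) are non-decreasing towards the root. The climb stops at the highest ancestor with s(u) ≤ j, and the block is the set of leaves below it.

```python
def block_at_scale(x, j, parent, s, children):
    """P_j(x): climb from leaf x to the highest ancestor u with s(u) <= j, then
    collect the leaves below u. parent maps each node to its parent (None at the
    root); s maps internal nodes to scale indices; children maps internal nodes
    to their child lists (leaves have no entry)."""
    u = x
    while parent[u] is not None and s[parent[u]] <= j:
        u = parent[u]
    leaves, stack = [], [u]
    while stack:
        w = stack.pop()
        if w in children:
            stack.extend(children[w])
        else:
            leaves.append(w)
    return sorted(leaves)
```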

We are left to describe how to compute the common refinement of the pullbacks of (L_{j_t})_t as a hierarchical partition represented in the tree structure 𝒫 of the previous paragraph. This is done in a top-down fashion as follows.

In the initialization step, 𝒫 is created as a rooted star whose root r is labeled by 8^{j_1} and whose leaves correspond to {σ_{8^{j_1}}(u) : u ∈ X}.

Next, inductively assume that 𝒫 is a hierarchical partition of {σ_{8^{j_{t−1}}}(u) : u ∈ X} corresponding to {L_{j_s} : s ≤ t − 1}. We refine 𝒫 to include L_{j_t} as follows:

• Replace the leaves of 𝒫: each σ_{8^{j_{t−1}}}(u) is replaced by
  {σ_{8^{j_t}}(v) : v ∈ X, σ_{8^{j_t}}(v) is a descendant of σ_{8^{j_{t−1}}}(u)}.
  This step is done in O(|V_{j_t}|) time by simply starting from an "old" leaf σ_{8^{j_{t−1}}}(u) as a vertex in T and descending in T to level 8^{j_t}/(2n).

• Next, incorporate L_{j_t} into the hierarchical partition in a straightforward way: scan the leaves of 𝒫, which are in V_{j_t}, grouped by their parents. Fixing such a parent u whose children v_1, ..., v_ℓ are all leaves, we partition v_1, ..., v_ℓ into the subsets {{v_1, ..., v_ℓ} ∩ C : C ∈ L_{j_t}}. For every such subset of size 2 or more we define a new parent w (which will be a child of u) with the label 8^{j_t}.

Hence, the t-th iteration of the algorithm above is executed in O(|V_{j_t}|) time, and by Lemma 4.3 the total time for constructing the common refinement is O(m log n).



5 Applications 

Proof of the first part of Theorem 2. As observed in Section 2, hierarchical partitions correspond to ultrametrics. As shown in [13], when the partition in every scale is a CKR partition, the resulting distribution over ultrametrics is a probabilistic embedding with O(log n) distortion⁷. The algorithm described in Section 4 samples a hierarchical partition (and hence an ultrametric) in O(m log² n) expected time. □

⁷ Technically, in [13] the hierarchical partition was built differently: instead of taking a CKR partition of the whole space at every scale and then the common refinement, at each scale they took many CKR partitions, one for each block of the partition of the previous scale. This
Proof of the second part of Theorem 2. A point x ∈ X is called β-padded in a hierarchical partition H = (P_{−1}, ..., P_{⌈log_8 Φ⌉}) if for every j, B(x, β8^j) ⊆ P_j(x).

The main part of constructing an O(β^{−1})-approximate distance oracle based on CKR partitions works as follows [21]. Set X_0 = X, and iteratively for i = 0, 1, ... do: compute a hierarchical CKR partition ℋ_i of X_i, and obtain an ultrametric 𝒰_i from ℋ_i. Let Y_i be the set of β-padded points in ℋ_i, whose size is controlled by Lemma 2.2. Set X_{i+1} := X_i \ Y_i, i := i + 1, and repeat until X_i = ∅. The set of ultrametrics (𝒰_i)_i, together with some supporting data structures, constitutes the approximate distance oracle.

By Lemma 2.2, E[|Y_i|] ≥ |X_i|^{1−16β}, and hence the number of iterations until X_i = ∅ is in expectation at most O(β^{−1} n^{16β}), so the total storage is as claimed.

There are two issues in the construction of (𝒰_i)_i that we have not covered yet. First, the task is to sample a hierarchical partition of X_i, which is only a subset of the vertices in the graph G = (X, E, ω). This is rather easy to handle by adapting the algorithms in Section 3 and Section 4 to work with subsets of the vertices.

The second issue is the computation of β-padded points. The β-padded points of a (single) Δ-bounded partition P of a weighted graph G = (V, E, ω) can be computed as follows. Add a new vertex s_0. For every edge (u, v) ∈ E such that P(u) ≠ P(v), add the edges (s_0, u) and (s_0, v), each of weight ω(u, v). Execute Dijkstra's shortest path algorithm from s_0, and delete all vertices at distance at most βΔ from s_0. This can be implemented in O(m + n log n) time. Note that in the hierarchical partition, if a point is not in V_j then it is padded at scale 8^j. Hence, in order to compute a β-padded point set in a hierarchical partition, for every t we cross out the points which are not 2β-padded in L_{j_t}. The remaining points are β-padded in the pullbacks of (L_{j_t})_t (as follows from Lemma 4.1), and hence also in the hierarchical partition. When implemented carefully, this can be done on every graph G_j in O(|E_j| + |V_j| log n) time, and by Lemma 4.3 in a total of O(m log² n) time. □
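The single-partition padding test can be sketched in Python with one multi-source Dijkstra run. The virtual source s_0 is simulated by seeding the heap with one entry per cut-edge endpoint; the names are hypothetical, and the adjacency dict is assumed to list each undirected edge in both directions, so both endpoints of every cut edge receive an s_0-edge. A vertex x is returned exactly when some point of a different block lies within distance `radius` (playing the role of βΔ) of x.

```python
import heapq

def non_padded(adj, block, radius):
    """Vertices x with B(x, radius) not contained in the block of x.
    adj: dict v -> list of (u, w), undirected, w > 0; block: dict v -> block id."""
    heap = []
    for u, nbrs in adj.items():
        for v, w in nbrs:
            if block[u] != block[v]:
                heap.append((w, u))          # edge (s0, u) of weight w(u, v)
    heapq.heapify(heap)
    dist = {}
    while heap:
        d, x = heapq.heappop(heap)
        if x in dist:
            continue                         # already settled at a smaller distance
        dist[x] = d
        for y, w in adj[x]:
            if y not in dist:
                heapq.heappush(heap, (d + w, y))
    return {x for x, d in dist.items() if d <= radius}
```

The β-padded points are then `set(adj) - non_padded(adj, block, beta * Delta)`.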

A t-spanner of a weighted graph G = (V, E, ω) is a subset of the edges E' ⊆ E such that the shortest-path metric on (V, E', ω|_{E'}) is at most t times the shortest-path metric on G. We need the following result.

Theorem 7 ([B]). Let G = (V, E, ω) be a weighted graph with n vertices and m edges, and let k ≥ 1 be an integer. A (2k − 1)-spanner of G with O(kn^{1+1/k}) edges can be computed in O(km) expected time.
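To illustrate the definition of a t-spanner, here is a sketch of the classical greedy construction of Althöfer et al. It is a different and much slower technique than the expected O(km)-time algorithm of Theorem 7, shown only to make the stretch condition concrete: scan the edges by increasing weight and keep an edge only if the spanner built so far does not already connect its endpoints within t times its weight. The function names are hypothetical.

```python
import heapq

def _dist(adj, s, t, limit):
    """Dijkstra distance from s to t in adj, giving up once keys exceed limit."""
    best = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue                    # stale heap entry
        if u == t:
            return d
        if d > limit:
            break                       # t is farther than limit
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(n, edges, t):
    """Greedy t-spanner of a graph on vertices 0..n-1; edges: list of (u, v, w)."""
    adj = {v: [] for v in range(n)}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if _dist(adj, u, v, t * w) > t * w:   # no short enough detour yet
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

Every discarded edge already has a detour of length at most t times its weight, so every shortest path of G is stretched by at most t.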

Proof of Theorem 3. By Theorem 7 with k = 3, given an n-point metric space (X, ρ), a 5-spanner H of (X, ρ) with O(n^{4/3}) edges can be constructed in O(n²) expected time. We next apply the second part of Theorem 2 to H, whose running time is o(n²); the stretch 5 of the spanner only affects the constant in the O(k) approximation. □

technicality is inconsequential for probabilistic embeddings. However, for the construction of approximate distance oracles, the approach of [13] does not seem to work, since it does not have stochastically independent partitions in the different scales. See also [22].